テクニカルレポート | GRACEセンター

(1)

ISSN 1884-0760

GRACE TECHNICAL REPORTS

Bidirectionalizing Structural Recursion on

Graphs

Soichiro Hidaka Zhenjiang Hu Kazuhiro Inaba

Hiroyuki Kato Kazutaka Matsuda Keisuke Nakano

GRACE-TR 2009–03 August 2009

CENTER FOR GLOBAL RESEARCH IN

(2)

(3)

Bidirectionalizing Structural Recursion on Graphs

Soichiro Hidaka Zhenjiang Hu Kazuhiro Inaba Hiroyuki Kato National Institute of Informatics

2-1-2 Hitotsubashi, Chiyoda-ku Tokyo 101-8430, Japan {hidaka,hu,kinaba,kato}@nii.ac.jp

Kazutaka Matsuda

The University of Tokyo/JSPS Research Fellow 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

[email protected] Keisuke Nakano

The University of Electro-Communications 1-5-1 Chofugaoka, Chofu-shi

Tokyo 182-8585, Japan [email protected]

August 31, 2009

Abstract

(4)

put

modify

t’

get

t

s’

s

Figure 1: Bidirectional Transformation

1 Introduction

Bidirectional transformation (Foster et al. 2005; Czarnecki et al. 2009) is characterized by a pair of transformations, a forward transformation get

and a backward transformationput as in Figure 1. The forward transforma-tionget is used to produce from a sourcesto a viewt, while the backward transformationput is to reflect modification on the view back to the source. Practical examples of bidirectional transformation include synchronization of replicated data in different formats (Foster et al. 2005), presentation-oriented structured document development (Hu et al. 2008), interactive user interface design (Meertens 1998), coupled software transformation (L¨ammel 2004), and the well-known view updating mechanism which has been inten-sively studied in the database community (Bancilhon and Spyratos 1981; Dayal and Bernstein 1982; Gottlob et al. 1988; Hegner 1990; Lechtenb¨orger and Vossen 2003).

Despite many promising results on bidirectional transformation, the data that can be dealt with are limited to linear strings or tree-like data struc-tures. It remains as an open problem (Czarnecki et al. 2009) whether it is possible to design a language that can support practical development of bidirectional transformations on graphs which contains node sharing and cycles. It would be remarkably useful in many applications if bidirectional transformation can be applied to graph data structures, because graphs play an irreplaceable role in representing more complex data structures such as models (e.g., UML diagrams) in model-driven software develop-ment (Stevens 2007), and Object Exchange Model (OEM) for exchanging arbitrary database structures (Papakonstantinou et al. 1995).

(5)

computation), because na¨ıve computation on graphs would visit the same nodes many times and possibly infinitely often.

In this paper, we give the first language-based (linguistic) solution to the problem of bidirectional graph transformation. We approach this problem by providing a bidirectional semantics for an existing graph query language UnQL (Buneman et al. 2000). We choose UnQL as the basis of our bidirectional graph transformation for the following two reasons.

• First, UnQL is a graph querying language that has been intensively studied in the database community with solid foundation and efficient implementation based on graph algebra. It has a concise and powerful surface syntax based on select-where clauses like SQL, and can be easily used to describe many interesting graph transformations.

• Second, and more importantly, graph transformations in UnQL are

structured in the sense that any transformation can be expressed in terms structural recursion, which can be evaluated in abulkway (Bune-man et al. 2000); a structural recursion is evaluated by first processing

in parallelon all edges of the input graph and then combining the re-sults. This feature significantly contributes to our bidirectionalization because it helps us to trace a corresponding source from its result.

Our technical contributions are summarized as follows.

• We are, as far as we are aware, the first who recognize the importance of structural recursion and its bulk semantics in addressing the challeng-ing problem of bidirectional graph transformation, and succeeded in a general graph transformation framework based on structural recur-sion. We show that graph transformations described with structured recursion are not only suitable for optimization as intensively stud-ied so far (Buneman et al. 2000), but also make backward evaluation easier.

• We give a formal definition of bidirectional semantics for structural recursion (Section 4), by (1) refining the existing forward evaluation so that it can produce useful trace information for later backward eval-uation, and (2) extending the bulk semantics of structural recursion from forward evaluation to backward evaluation. And we prove that our bidirectional semantics is well-behaved. The success of bidirection-alizing non-trivial UnQL shows practical power of our approach.

(6)

(a) A Simple Graph (b) An Equiva-lent Graph

1

2

a

3

b

4

c

(c)

1

2

a

3

b

4

c

(d) (≁2(c))

Figure 2: Graph Equivalence Based on Bisimulation

The rest of this paper is organized as follows. We start with a brief overview of the basic idea of graph data model and graph query language UnQL in Section 2, before we show how an UnQL query program can be automatically transformed to structural recursion in its underlying graph algebra UnCAL in Section 3. Then we provide bidirectional semantics for UnCAL and prove that the bidirectional semantics is well-behaved in section 4. Section 5 discusses our implementation and experimental results, Section 6 summarizes related work, and Section 7 concludes the paper.

2 Graph Transformation in UnQL

We start with a brief overview of the graph data model and unidirectional graph transformation in UnQL (Buneman et al. 2000), a graph query language to be bidirectionalized in this paper.

2.1 Graph Data Model

External Graph Representation Graphs in UnQL are rooted and

di-rected cyclic graphs with no order between outgoing edges. They are edge-labelled in the sense that all information is stored as labels on edges and the labels on nodes serve as a unique identifier and have no particular meaning. Figure 2(a) gives a small example of a directed cyclic graph with six nodes and seven edges. In text, it is represented by

G = {a:{a:G1},b:{a:G1},c:G2}

G1 = {d:{}}

(7)

where the notation {l1:G1, . . . , ln :Gn} denotes a set representing a graph which containsnedges with labelsl1, . . . , ln, each li pointing to a graphGi, and the empty set {} denotes a graph with a single node. Two graphs G1 andG2 can be merged using set union operationG1∪G2. This graph model is set-based in the sense that{a:{},a:{}}and{a:{}}represent the same graph. In addition, theε-edge is allowed to represent shortcut of two nodes, and works likeε-transition in automaton. For instance, we have

{ε:{a:G1}, ε:{b:G2}}={a:G1,b:G2}.

Internal Graph Representation While the external graph

representa-tion suffices for users to consider when writing graph transformarepresenta-tion, the internal graph representation is designed for internal implementation and semantics description as in Section 4. Different from the external represen-tation, nodes in the internal representation may be marked withinput and

output marker, which are used as an interface for composition of graphs. Now we describe the formal definition and notations of graphs. These notations will be used to specify the bidirectional semantics of UnCAL in Section 4. Let Label, X and Y be a (possibly infinite) set of labels, a set of input markers, and a set of output markers, respectively. A graph G is denoted by a quadruple (V, E, I, O), where V is a set of nodes, E⊆V×Labelǫ×V withLabelǫ=Label∪{ǫ}is a set of edges,I ⊆ X ×V is a set of pairs of input markers and corresponding input node, andO⊆V × Y

is a set of pairs of output nodes and associated output marker. For each marker&x∈ X, there is at most one nodev such that (&x, v)∈I. The node v is called input node with marker &x and denoted by I(&x). In contrast, more than one node can be marked with an identical output marker. They are calledoutput nodes.

Intuitively, input nodes can be regarded as root nodes of the graph. In other words, although the external representation limits a graph to be singly rooted, internally we deal with multiple roots. For singly rooted graphs, we implicitly use the default marker &to indicate its single root. An output node can be regarded as a “context-hole” of graphs where an input node with the same marker is plugged later. Indeed, the “vertical graph composition” operatorG1@G2 is defined to plug the input nodes of

G2 into their corresponding output nodes inG1.

For example, in internal representation, the simple graph in Figure 2(a) is denoted by (V, E, I, O) where V = {1,2,3,4,5,6}, E = {(1,a,2),(1,b,3),(1,c,4),(2,a,5),(3,a,5),(4,c,4), (5,d,6)}, I = {(&,1)}, and O = {}. This graph is not really typical, as it has no ε-edge, only one input node, and no output node. We will see more examples later.

Graph Equivalence Graph bisimulation defines value equivalence

(8)

if they are equal when viewed as sets of paths from the root. Informally, we say that there is a simulation from graphG1to graphG2 if every nodex1 in

G1 has a counterpart x2 inG2, and if there is an edge form x1 to y1 inG1, then there is a corresponding edge fromx2 toy2 inG2 that is a counterpart of y1. In UnQL data model, graph equivalence of two graphs requires the correspondence of input and output markers between them as well.

For instance, the graph in Figure 2(b) is value equivalent to the graph in Figure 2(a); the new graph has an additionalε-edge (denoted by the dotted line), duplicates the graph rooted at node 5, and unfolds and splits the cycle at node 4.

Note also that sets of paths from the root do not always represent value equivalence. The graph in Figure 2(c) is not value equivalent to the graph in Figure 2(d) although they are represented by an identical set of paths

{a.b, a.c} from the root.

As a remark, the notion of bisimulation is useful because it allows varia-tion in representing semantically equivalent graphs. It has been shown that a graph transformation defined in UnQL preserves bisimilarity (Buneman et al. 2000). If two graphs G1 and G2 are bisimilar, f(G1) and f(G2) are bisimilar for any transformation f in UnQL.

2.2 UnQL

UnQL has a convenient select-where construct like SQL to extract infor-mation from a graph and building a new graph with the inforinfor-mation. Figure 3 shows an abstract syntax of UnQL. We omit the detailed explanation of the language syntax, which can be found in (Buneman et al. 2000). Rather we illustrate the important features through some examples. Variables in UnQL are prefixed with $ in this paper. For all examples below, we assume a variable $db is bound to the graph in Figure 2(a).

2.2.1 The select-where Construct

A query of the form selectTwhereB1, . . . , Bn extracts information from graphs based on the binding and/or Boolean conditions B1, . . . , Bn and constructs a graph according to theT.

Example 1. The following query returns subgraphs that are pointed by b

from the root of $db.

select $G where {b: $G} in $db

This query first matches the graph pattern {b: $G} against the graph $db

(9)

(query) Q ::= select T where B, . . . , B (template) T ::= {L:T, . . . , L:T} | $G | Q

| T∪T | f($G)

| ifL=LthenT elseT

| let sfunf {Lp:Gp} =T

| f {Lp:Gp} =T

. . .

sfunf′ {Lp:Gp} =T

| f′ {Lp:Gp} =T

. . . . . . in T

(condition) B ::= Gp in $G | L=L | L=L

(label) L ::= $l | a

(label pattern) Lp ::= $l | Rp

(graph pattern) Gp ::= $G | {Lp:Gp, . . . , Lp:Gp}

(regular Rp ::= a | | Rp.Rp

path pattern) | (Rp|Rp) | Rp? | Rp∗

Figure 3: Syntax of UnQL

(a)Result of Example 1

(b) Result of Example 2

(c) Result of Example 3

(10)

Example 2. The following query has multiple conditions in thewherepart and construction of graphs in theselect part.

select $G1∪$G2 where {b: $G1} in $db,

{a: $G2} in $db

It returns all subgraphs that are pointed by either b or a from the root. Figure 4(b) shows the result of this query.

2.2.2 Regular Path Patterns

Regular path patterns, similar to XPath, are provided for concisely express-ing “deep” queries against a graph.

Example 3. Consider the following query.

select {result: $G1}

where { ∗.(a|b) : $G1} in $db,

{$l : $G2} in $G1,

$l =a

It extracts all subgraphs $G1 according to the regular path ∗.(a|b) (i.e., any path ended with an edge labelled a or b), keeps those subgraphs that do not contain an edge ofafrom their root, and glues the results with new edges labelled result. It returns all subgraphs that are pointed by eitherb

orafrom the root. Figure 4(c) shows the result of this query.

2.2.3 Structural Recursion

Structural recursion is powerful to define various functions to manipulate graphs. A structural recursion f on graphs is a very simple recursive computation scheme on graphs defined by

f {} = {}

f {$l : $G} = t($l,$G)⊙f($G) f ($G1∪$G2) = f($G1)∪f($G2)

where⊙is a given binary operator and the termt($l,$G) does not contain recursive call tof. Different choice of⊙defines different function. Function f is homomorphic in the sense that application result of a union of two graphs coincides with a union of application result of them. Since the first and the third equations are the same for any definition, we may omit them and simplify the above definition as:

sfun f {$l : $G}=t($l,$G)⊙f($G).

(11)

manipulates each edge of the graph in parallel. Note also that structural recursion has a syntactic restriction that no return value of a function should be fed to another function as its input. For instance, a function definition

sfun f {$l : $G}=t($l,$G)⊙f(g($G))

is invalid because g($G) is fed to functionf.

This restriction guarantees that the recursion always terminates.

Example 4. As a simple example, we use the following structural recursion to replace all labelsabydand delete the edges labelledcfor an input graph.

sfun a2d xc {$l : $G}=if$l =athen {d:a2d xc($G)}

else if$l =cthen a2d xc($G) else {$l :a2d xc($G)}

A natural extension of the above structural recursion is to allow mutual recursion as shown in Figure 3. For example, a mutual recursive definition

sfun h{b: $G} ={b:a2e($G)}

| h{$l : $G} =h($G)

sfun a2e {a: $G} = {e:a2e($G)}

| a2e {$l : $G} ={$l :a2e($G)}

defines a functionhwhich erases all edges until it reaches a b. After that it copies the graph, but replace everyawith an e.

2.3 A Practical Example: Customer2Order

As a more practical example, we consider a transformation from customers’ graph to orders’ graph, which is adapted from a similar example in the textbook on model-driven software development (Pastor and Molina 2007). It will serve as one of the running examples of this paper.

Figure 5 gives a simple graph representing customers’ information. Re-member that all information should be stored on labels of edges in our graph model. All numbers in nodes have no particular meaning. The graph has a root pointing to two customers, each having a name, some email addresses, several addresses of different types (e.g. shipping or contractual customer address). A customer can have many customer orders.

(12)

1 0 1003 3 2 "16/12/2008" 5 4 1002 7 6 "16/10/2008" 9 8 1001 11 10 "16/07/2008" 13 12 "IPL of Tokyo"

15 14 "100-888" 17 16 contractual 19 18

"BiG office of Tokyo" 21 20 "200-777" 23 22 shipping 24 no date 25

info code type 27 26 "kato@biglab" 29 28 "Kato" 30 no date 40 order_of order 31 order 32 add 34 email 36 email 38 name no date order_of

info codetype

33 "tanaka@gmail" 35 "tanaka@biglab" 37 "Tanaka" 39

order order_of add email name add 41

customer customer

Figure 5: Cyclic Graph gcus Representing Customer-Centric Database

1 0 1003 3 2 "16/12/2008" 5 4 1002 7 6 "16/10/2008" 9 8 1001 11 10 "16/07/2008" 13 12

"BiG office of Tokyo" 15 14 "200-777" 17 16 Kato 19 18 Tanaka 20 22 order 24 order 26 order

no date customer_name

21

ship customer_nameno date

23

ship no date customer_name

25 ship

info info codecode info code

Figure 6: Graph gord Representing Order-Centric Database

expressed in UnQL as follows, and generates the graph in Figure 6.

select {order: {date: $date,

no: $no,

customer name: $name,

addr: $a}}

where {customer.order: $o} in $customer db,

{order of: $c, date: $date, no: $no} in $o,

{add: $a, name: $name} in $c,

{type:shipping} in $a

3 UnCAL: An Internal Graph Algebra

(13)

T ::={} (* one-node graph *)

| {L:T} (* graph with a single edge from root *)

| T ∪T (* union of two graphs *)

| &x:=T (* label the root node with input marker x *)

| &y (* graph with output marker y *)

| () (* empty graph *)

| T ⊕T (* disjoint union *)

| T @T (* append of two graphs *)

| cycle(T) (* graph with cycles *)

| $G (* variable reference *)

| ifL=LthenT elseT (* conditional *)

| rec(λ($l,$G).T)(T) (* application of structural recursion *)

L ::= $l (* label variable reference *)

| a (* label (a∈Label) *)

Figure 7: Core UnCAL Language

language UnQL. An UnQL query program written by users is translated into well-defined UnCAL expressions. In this section, after explaining how to represent structural recursion in UnCAL with new graph constructors, we show that any UnQL program can be automatically mapped to structural recursions in UnCAL.

3.1 Structural Recursion in UnCAL

The core of the UnCAL language is summarized in Figure 7. In addition to the graph constructors, UnCAL provides structural recursion, a general way to manipulate graphs.

Graph Constructors There are nine data constructors which are used

to build arbitrary graphs (Buneman et al. 2000).

• {}(one-node graph): it constructs a graph with a single node without edges.

• {l : G} (one-edge-connected graph): it constructs a graph with the root pointing to the root of the graphGthrough the edgel.

• G1∪G2 (union of graphs): it unifies two graphs by creating a new root and connect it to the roots ofG1 andG2 usingε-edges.

• &x:=G(graph with input marker): it adds some input marker to the root ofG.

(14)

• () (empty graph): it constructs an empty graph which has neither node nor edge.

• G1 ⊕G2 (disjoint union of graphs): it constructs a graph by compo-nentwise union.

• G1@G2 (append of graphs): it appends two graphs by connecting the output nodes ofG1 with corresponding input nodes ofG2withε-edges.

• cycle(G) (cyclic graph): it connects the input nodes with the output nodes ofG to form cycles.

The formal definition of the semantics of these constructors can be found in Section 4. It is worth noting that this set of constructors are powerful enough to describe any unordered graphs.

Example 5. The graph in Figure 2(a) can be constructed as follows (though not uniquely).

&z@cycle(&z:={a:{a:&z1}} ∪ {b:{a:&z1}} ∪ {c:&z2}

⊕(&z1 :={d:{}})

⊕(&z2 :={c:&z2}))

We will use the following two abbreviations: {l1 : G1, . . . , ln : Gn} for

{l1 : G1,} ∪. . . ∪ {ln : Gn} and (t1, . . . , tn) for t1 ⊕. . . ,⊕tn. Thus, the above UnCAL program becomes

&z@cycle(&z:={a:{a:&z1},b:{a:&z1},c:&z2},

&z1 :={d:{}},

&z2 :={c:&z2}).

It is worth remarking that not all these constructors are required to transform UnQL queries to UnCAL terms. In fact, the constructorcycle is not required.

Structural recursion in UnCAL Any structural recursion in UnQL

let sfun f {$l : $G}=t⊙f($G) in f(t′₎

can be described byrecas

&z1@ (rec(λ($l,$G).(&z1:=t⊙&z1))(t′).

(15)

(a) Before removingε-edges (b) After removingε-edges

Figure 8: Bulk Semantics in UnCAL

Example 6. The UnQL structural recursion a2d xc given in Example 4 above is represented by

&z1@(rec(λ($l,$G′).if$l =athen(&z1 :={d:&z1}) else if$l =cthen(&z1 :={ε:&z1})

else(&z1 :={$l :&z1}))($G))

For mutually defined functions, we can merge them into one rec con-struct by the tupling transformation (Hu et al. 1997).

3.2 Bulk Semantics of Structural Recursion

Structural recursion has two equivalent semantics under the graph bisimu-lation: bulk semantics and recursive semantics. The former is the promi-nent feature of structural recursion, whereas the latter is the usual re-cursive semantics with memoization. Informally, the bulk semantics for rec(λ($l,$G).t)(G′) is thatλ($l,$G).t is applied independently on all edges of graph G′_{, then the results are joined together with} _ε_{-edges (as in the} _@

operation).

Example 7. Consider to apply the structural recursive functiona2d xcto the graph in Figure 2(a). Applying the function to each edge from i to j gives a subgraph containing a graph with an edge fromSijtoEij(where the dotted edge denotes anε-edge), then marking the root with an input marker, and finally joining the nodes with ε-edges according to the original graph shape and input/output markers yields the graph in Figure 8(a), which is equivalent to the graph in Figure 8(b) if we remove allε-edges.

3.3 From UnQL to UnCAL

(16)

this mapping transformation, and then clarify the property of the obtained UnCAL programs that will serve as the target of our bidirectionalization in Section 4.

As we have shown that structural recursion in UnQL can be mapped to that in UnCAL before, to show that every UnQL expression can be mapped to structural recursion in UnCAL, it is sufficient to show that the select-where expression can be expressed in terms of structural recursion (in UnQL). This can be achieved by three steps. First, we can unnest patterns in the where-clause such that each pattern is in the simple form of

{Lp: $G}in$G with the following three rules.

where {Lp1 :Gp1, ..., Lpn:Gpn}in$G

−→where {Lp1:Gp1}in$G, . . . ,{Lpn:Gpn}in$G where {Lp:a}in$G

−→where {Lp: $G′}in$G,{a: $G′′}in$G′

where {Lp:Gp}in$G

−→where {Lp: $G′_}_in_$_G_{, Gp}_in_$_G′

Note that $G′ and $G′′ are fresh variable names in the above rules.

Then, an UnQL query with simple patterns (not regular path patterns) can be translated into structural recursions with condition in UnQL.

select T where −→T

select T where {Lp: $G1}in$G2,rest

−→let sfunf {Lp : $G1}=select T where rest in f ($G2)

select T whereL1 =L2,rest

−→ifL1 =L2 then select T where rest else{}

When the patterns are regular path patterns, they are translated into structural recursions. The idea of translation is to express a regular path pat-tern as an NFA (Non-deterministic Finite Automaton), associate a function to each state, and produce a mutually defined structural recursion according to the transition of the NFA.

As a simple example, consider the regular path pattern _*.a.b, an equivalent non-deterministic automaton1 has five states and the following transitions :

s1

Any

−−−→s4, s1

Any

−−−→s5, s1−→a s3, s3→−b s2, s4−→a s3,

s5

Any

−−−→s4, s5

Any

−−−→s5, s5−→a s3

(17)

The initial state iss1 and the terminal state iss2. So, the following mutual structural recursion can be defined.

sfun f1{a: $G} =f3($G)

|f1{$l : $G} =f4($G)∪f5($G)

sfun f2{$l : $G} ={}

sfun f3{b: $G} =f2($G)∪$G

|f3{$l : $G} ={}

|f4{$l : $G} ={}

|f5{$l : $G} =f4($G)∪f5($G)

In general mutually defined recursive functions are translated into a single recursive function(Hu et al. 1997). For UnQL/UnCAL, the new single recursive function is defined using markers. Each marker corresponds to a function that is mutually defined with others. Output markers are used instead of recursive calls in the body of the function. This new function returns a tuple of results of each function, represented by markers, that is mutually defined with others.

For example, consider the following mutual recursive definition shown in Section 2.2;

sfun h{b: $G} ={b:a2e($G)}

| h{$l : $G} =h($G)

sfun a2e {a: $G} = {e:a2e($G)}

| a2e {$l : $G} ={$l :a2e($G)}

This function is translated into single recursive function as follows;

sfun f {$l : $G}=g($l,$G) @ f($G)

whereg($l,$G) is defined as follows;

g(a,$G) = ((&z1 :=&z1) ⊕(&z2 :={e:&z2}))

g(b,$G) = ((&z1 :={b:&z2}) ⊕(&z2 :={b:&z2}))

g($l,$G) = ((&z1 :=&z1) ⊕(&z2 :={$l :&z2}))

Note that markers &z1 and &z2 correspond to functions h and a2e, respec-tively. Recall that an output node vcan be regarded as a “context-hole” of grapshs and append of graphs (t1@t2) plugs the input nodes int2 into their corresponding output nodes in t1. Recall also that disjoint union (t1⊕t2) performs componentwize union oft1 and t2.

As shown in Section 3.1, the above form of single structural recursive functionf is descrived byrec as

(18)

Note that append operator @used in the form usingsfunis dropped.

Lemma 1(Target UnCAL). The mapping algorithm from UnQL to UnCAL produces an UnCAL expression with the following syntactic properties.

• For recursion application rec(λ($l,$G).t)(t′), the argument t′ should be a variable, which implies no intermediate graphs.

• For disjoint union(t1⊕t2),t1 andt2 should be in the form of &x:=t′.

• For append (t1 @ t2), the left operand should be an output marker

and the right operand should be an application of structural recursion

(&y@rec(λ($l,$G).T)(T)).

• Nocycle(t) expression should appear.

4 Bidirectionalization of UnCAL

In this section, we show that an UnCAL program can not only be evaluated forwardly as usual, but also be evaluated backwardly to reflect updates from the result to the source. We shall give a bidirectional semantics for each construct of UnCAL. Note that the backward semantics mentioned first can cope with general in-place updating and deletion, but not insertion. We extend the semantics ofif andrecto cope with insertion in Section 4.5, and finally prove well-behavedness of our bidirectional semantics.

4.1 Bidirectional Properties

Forward evaluation (often called get in the literature) of an UnCAL term t computes a result graph G = [[t]]ρ (view), under a variable binding envi-ronmentρ (input) which is a mapping from variables to graphs. Backward evaluation (or put) goes backward. Given an original input environment ρ and a possibly modified view graphG′_{, it computes an updated environment}

ρ′ =tρ_G′.

The following two important properties define the well-behavedness of a pair of forward and backward evaluations, which are essentially the same as those in lenses(Foster et al. 2005).

[[t]]ρ=G implies tρ_G =ρ (GetPut)

tρ_G′ =ρ

′ _implies _[[_t_]]ρ′

=G′ (PutGet)

The (GetPut) property says that no change on the view graph should give no change on the environment, while the (PutGet) property says that the backward computation computes a new environment ρ′ from G′ in such a way that applying the forward computation underρ′ _{again should give the}

(19)

4.2 Embedding Trace Information in Structured Node IDs

Different from unidirectional computation, the forward evaluation in the context of bidirectional computation should keep trace information for later backward evaluation. Our forward evaluation rules will refine (extend) the original semantics of UnCAL. Basically, each graph constructor expression is straightforwardly evaluated to the graph as it denotes. However, not only constructing the output graph structure, we also embed some “trace” information in each node of the output graph to guarantee the correct backward evaluation. The nodes of the output graph are identified by what we call theStructured IDs that describe where the nodes came from.

Consider the upper part of Figure 9, which demonstrates evaluation of G1 ∪G2 (a union of two graphs). For later backward evaluation, we need to decompose the result graph into two while keeping the original correspondence. And this is difficult because our graphs are unordered. Our idea is to assign structured IDs to the nodes of the output graph so that together with information of the original inputsG1 andG2the output graph can be decomposed.

Formally, the structured ID is defined as follows.

StructuredID ::=OriginalID

| InC CodePos

| InC∪ CodePos Marker

| HubCodePos StructuredID Marker

| FrECodePos StructuredID Edge

where Edge =StructuredID ×Label ×StructuredID, CodePos is the set of position identifiers that are uniquely assigned to all syntactic nodes of the UnCAL program, and OriginalID is the set of identifiers that are uniquely assigned to all nodes of the input graph. All the structured IDs are to denote the output nodes2 _{of UnCAL transformations. The ID (}_InC _p_{) denotes an} output node created from a constant expression like {} or {a : {}}. The value p is the unique identifier assigned to each syntactic node of UnCAL expressions. For example, there are two syntactic nodes in the constant expression {a : {}}: the whole expression itself and the subexpression

{}. They are assumed to have distinct position identifiers, say, p1 and p2 respectively, and the corresponding output nodes are named distinctly as (InC p1) and (InC p2). The ID (InC∪ p &m) is similar to (InC p) but also

indexed with the marker &m. This type of ID is used for representing an output graph of an union-expression, as explained later. Last but not least, the IDs of the from (FrEp i e) and (Hubp v m) are used for constructing the output nodes of the structural recursion:rec. Recall that the bulk semantics of the structural recursion first evaluates the recursion body at every edgee

2_{Input/output here denotes input/output of transformations. We say “input/output}

(20)

of the input graph, and then joins the results at the point of input/output markers. Now, suppose the evaluation of the body expression at the edgee generated an output node with ID i. We augment such output node with the ID (FrEp i e) where p is the position of the rec expression itself. The ID (Hubp v&m) denotes the output node used for joining the results of bulk evaluation of structural recursion at the position p.

It is worth noting that our assignment of structured IDs makes an ID independent of actual evaluation order. In this sense, the ID assignment strategy isfunctional and side-effect free. This fact helps to simplify the our bidirectional semantics.

4.3 Issues on Backward Evaluation

Similarly to the usual put, the backward semantics ρ′ ₌_tρ

G′ requires the original input environment ρ as well as the modified target G′_{. The}

origi-nal input is used for decomposing the target so as to define the backward semantics inductively. For example, to computet1∪t2ρ_G′, we first decom-poseG′ to two parts G′1 and G′2 and then inductively computeρ′1=t1ρ_G′

1

and ρ′₂ =t2ρ_G′

2. For this decomposition, we need the original, unmodified

version ofG′₁andG′₂, i.e.,G1 = [[t1]]ρandG2 = [[t2]]ρ. This is the first reason why we take the originalρ as the input here.

Another reason is to use the original environment for merging the up-dated environments. In general, multiple environment produced by backward evaluation are merged by⊎ρ, with original environment ρconsidered. Con-flicts are resolved in this operator as follows: if no difference between the original environment and the result of backward evaluations is detected, it simply returns the original environment. Otherwise, the result of backward evaluation takes precedence, provided that (possibly multiple) result(s) are consistent with the original environment. It fails if none of the condition hold.

4.4 Formal Definition of Bidirectional Semantics

In this section, we give formal definition of bidirectional semantics for UnCAL, where only in-place updating and deletion is considered. We will extend the bidirectional semantics so that insertion can be coped with in Section 4.5.

4.4.1 Bidirectional Evaluation of Simple Expressions

(21)

modify

ε ε

&

∪

a

c d c b

& G1

b c

&

a

c _d

ε ε

&

b c a

c _d

x

b c a c d

x

& &

G1

G′2

G′1

decomp

G

G′

Figure 9: Example for Bidirectional Computation of Union

to theε-edges and the structured IDs. Then we show that this bidirectional evaluation of graph constructors can be extended to bidirectional evalua-tion of UnCAL expressions except for the structural recursions. Finally, we tackle the problem of bidirectional semantics for structural recursions. In the following definitions,pdenotes position identifier.

Nullary constructors {} (single node graph), &y (a node with output marker) and () (empty graph) construct constant graphs in their forward computation. For backward computation, they are constant and hence accept no modification on the result view.

[[{}]]ρ = ({InC p},∅,{(&,InC p)},∅)

{}ρ_G′ = ρ ifG′= [[{}]]

ρ

[[&m]]ρ = ({InC p},∅,{(&,InC p)},{InC p,&m})

&mρ_G′ = ρ ifG′= [[&m]] ρ

[[()]]ρ = (∅,∅,∅,∅)

()ρ_G′ = ρ ifG′= [[()]] ρ

(22)

on the graph is reflected to the other operandG2 (asG′2).

[[{t1 :t2}]]ρ =

({InC p} ∪V2, E1∪E2,{(&,InC p)}, O2) where E1 = {(InC p, l1, I2(&))}

(V2, E2, I2, O2) = [[t2]]ρ

l1 = [[t1]]ρ

{t1:t2}ρ_G′ =t1

ρ l′

1 ⊎ρt2

ρ G′

2

where l1 = [[t1]]ρ

G2 = [[t2]]ρ (l′

1, G′2) =decomp{l1:G2}(G ′₎

Here, the decomposition function is defined as follows:

decomp_{_l₁_:_G₂_}(G′) =

(l′₁,(V′\ {r′}, E′\ {e′},{(&, v)}, O′)) where (V2, E2,{(&, v)}, O2) =G2

(V′, E′,{(&, r′)}, O′) =G′ e′ = the unique edge in E′

of the form (r′_{, l}′

1, v).

The modified output graphG′ is decomposed into its unique root edge e′ _{= (}_r′_{, l}′

1, v) and the rest of the graph rooted at v. If G′ have more than one edges from the root node or the new rootv does not match the root node of the original resultG2, the backward evaluation fails.

∪ unifies two graphs by connecting input nodes in two graphs with matching markers usingε-edges in forward computation. Its backward computa-tion removes these edges and restores the operand graphs while taking modifications to them into account. Consider an example in Figure 9. The edge labeledx (bold arrow) is added on the view. Backward com-putation removes these ε-edges to restore original input nodes, and compute modified operands by collecting reachable parts from each of the input nodes3. The added edge is reachable from the input node of G′

2, so the modification belongs to the second operand. For well-behavedness, backward computation checks if ε-edges from the input nodes are changed, and fails if that is the case. Note that the forward computation could have been glued the two input nodes together, but doing so would make splitting of the view difficult, since both of the

3_{Reachable parts from a given node can be computed by traversing edges from that}

(23)

operands can be reachable from the glued node.

[[t1∪t2]]ρ = (V, Eu∪E1∪E2, Iu, O1∪O2) whereV =Vu∪V1∪V2

(V1, E1, I1, O1) = [[t1]]ρ (V2, E2, I2, O2) = [[t2]]ρ

M = inMarkers(t1) =inMarkers(t2)

Vu = {InC∪ p &m|&m∈M}

Eu = {(InC∪ p &m, ε, v)|(&m, v)∈I1}

∪ {(InC∪ p &m, ε, v)|(&m, v)∈I2}

Iu = {(&m,InC∪ p &m)|&m∈M}

t1∪t2ρ_G′ =t1

ρ G′

1 ⊎ρt2

ρ G′

2

where G1 = [[t1]]ρ

G2 = [[t2]]ρ (G′

1, G′2) =decompG1∪G2(G ′₎

The decomposition function ensures that the root node of the modified graph G′ is an origin of a bunch of ε-edges, whose destination node came from the root node of either the original G1 or theG2.G1\G2 denotes componentwise set difference.

decomp_G₁_∪_G₂(G′) =

xreachable(G′

1, G1),xreachable(G′2, G2) where (V′, E′, I′, O′) = G′

(Vi, Ei, Ii, Oi) = Gi G′

i =reachable((V′, E′, Ii, O′)) satisfying

inMarkers(G1) =inMarkers(G2)

∀&m∈inMarkers(G1),(&m, r′)∈I′, (r′, ε, v′)∈E′ :

(&m, v′₎_∈_I

1∪I2

xreachable(G′_{, G}_{) =}

(Vr′_∪_Vu_{, E}r′_∪_Eu_{, I}r′_∪_Iu_{, O}r′_∪_Ou₎ where (Vr′_{, E}r′_{, I}r′_{, O}r′_{) =} _reachable₍_G′₎

(24)

⊕ performs componentwise union in its forward computation. For backward computation, it is like ∪, except that noε-edge is involved.

[[t1⊕t2]]ρ= (V1∪V2, E1∪E2, I1∪I2, O1∪O2) where (V1, E1, I1, O1) = [[t1]]ρ

(V2, E2, I2, O2) = [[t2]]ρ

t1⊕t2ρ_G′ =t1

ρ G′

1 ⊎ρt2

ρ G′

2

whereGi = [[ti]]ρ

(G′₁, G′₂) =decomp_G₁_⊕_G₂(G′)

decomp_G₁_⊕_G₂(G′_{) =} _decomp

G1∪G2(G ′₎

withoutsatisfying

condition

@ appends two graphs by connecting the output nodes of the left operand with corresponding input nodes of the right operand withǫ-edges in its forward computation. Currently, we allow this operator to be used only for the projection of one component of the result of structural recursion by &zi@rec(tb)(ta). Note that @ may introduce unreachable part in the second argument, because of unmatched I/O nodes. Backward computation carefully passes those parts backwards untouched to avoid unnecessary failure because of inconsistency because these parts are part of ordinary computation (computation on reachable parts) before discarding by the @ operator. Note that users operating on the view do not change these unreachable parts since these parts are invisible for the users.

[[&m@t2]]ρ = (V, E@∪E2,{(&,InC p)}, O2) whereV ={InC p} ∪V2

V1={InC p}

(V2, E2, I2, O2) = [[t2]]ρ

E@ ={(InC p, ε, v)|(&m, v)∈I2}

&m@t2ρ_G′ =t2 ρ G′

2

where (V′, E′, I′, O′) =G′ (V2, E2, I2, O2) = [[t2]]ρ

E@ ={(InC p, ε, v)|(&m, v)∈I2}

G′₂ = (V′\ {InC p}, E′\E@, I2, O2)

(:=) distributes the marker on the left operand to each of the input marker of the graph in the right operand, using the Skolem function “.” that satisfies (&x.&y).&z = &x.(&y.&z) (associativity) and &.&x = &x.& =

(25)

of the input markers inG′ at the front.

[[&m:=t1]]ρ = (V1, E1, I, O1) where (V1, E1, I1, O1) = [[t1]]ρ

I ={(&m.&x, v)|(&x, v)∈I1}

&m:=t1ρ_G′ =t1 ρ G′

1

whereG′

1 = (V′, E′, I1′, O′) (V′, E′, I′, O′) =G′

I₁′ ={(&x, v)|(&m.&x, v)∈I′}

($v) looks up variable binding from environmentρ. Backward computation updates the binding.

[[$v]]ρ = ρ($v)

$vρ_G′ = ρ[$v←G′]

(if) first evaluates the conditional expressionband by the result it evaluates either thethenbranch or theelsebranch in the forward computation. For the backward computation, modification to the environment as a result of modification to the view may change the branching behavior (result ofb). In order to make sure that well-behavedness still holds, backward semantics chooses the branch in which a result of another forward computation on the conditionbagrees. To cope with possible non-determinism where both branches agree, the branch taken in the forward computation takes precedence. If neither of the branches agree, the backward computation fails.

[[ifbthent1 elset2]]ρ = [[ti]]ρ

where i= (if [[b]]ρ=truethen 1 else 2)

ifbthent1 elset2ρ_G′ =ρ′_i′

where ρ′

1 =t1ρ_G′ ρ′

2 =t2ρ_G′

i = (if [[b]]ρ=truethen 1 else 2)

i′ ₌

i if [[b]]ρ′i _{= [[}_b_]]ρ 3−i if [[b]]ρ′3−i _{= [[}_b_]]ρ

(26)

fwd eachedge(ρ, tb, Ga=( , Ea, , ))

= 

(e,[[tb]]ρe) ˛ ˛ ˛ ˛

e∈Ea, label(e)=ε,

ρe=ρ{$l→label(e),$g→subgraph(Ga, e)} ﬀ

composerec(G,(Va, Ea, Ia, Oa), M) = (Vbody∪Vhub, Ebody∪Espoke∪Eeps, I, O) whereVbody ={FrEpe v|(e,(Ve, , , ))∈ G, v∈Ve}

Ebody ={(FrEpe u, l,FrEpe v)|(e,( , Ee, , ))∈ G,(u, l, v)∈Ee}

Vhub ={Hubpv&m|v∈Va,&m∈M} Espoke ={(Hubpv&m, ε,FrEpe u)|

&m∈M,(e= (v, , ),( , , Ie, ))∈ G,(&m, u)∈Ie}

∪ {(FrEpe u, ε,Hubpv&m)|

&m∈M,(e= ( , , v),( , , , Oe))∈ G,(u,&m)∈Oe}

Eeps ={(Hubpv&m, ε,Hubp u&m)|(v, ε, u)∈Ea,&m∈M} I ={(&n.&m,Hubpv&m)|(&n, v)∈Ia,&m∈M}

O ={(Hubpv&m,&n.&m)|(v,&n)∈Oa,&m∈M}

Figure 10: Core of the Forward Semantics of recat Code Position p

Property (1) is required for the consistent decomposition of targets. Property (3) is required to avoid unnecessary failure of backward computation by interference of unreachable parts on reachable parts. They are also natural from the user’s point of view since these components (ε-edge, markers and unreachable parts) are all invisible to users.

4.4.2 Bidirectional Evaluation of Structural Recursion

In this section, we give bidirectional semantics for structural recursionrec. For forward computation,recexpressions are interpreted inbulk semantics

as follows:

[[rec(λ($l,$g). tb)(ta)]]ρ=

composerec(fwd eachedge(ρ, tb, Ga), Ga, M) where M =inMarker(tb)∪outMarker(tb)

Ga = [[ta]]ρ

where fwd eachedge and composerec is defined in Figure 10. Intuitively,

fwd eachedge evaluates the expression tb at each edge e of the argument graph Ga and returns the set of result graphs. Then, composerec glues all the graphs together along the structure of the inputGa.

(27)

bwd eachedge(ρ, tb,G′)

=n(e,ebρ e G′

e )˛˛

˛(e, G

′

e)∈ G′, ρe=ρ{$l→label(e),$g→subgraph(e)} o

decomprec(ρ, tb,(V

′_{, E}′_{, I}′_{, O}′₎_{, G}

a, M)

= 8 > > > > > > > > < > > > > > > > > :

(e, G′

e) ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛ ˛

e∈Ea,label(e)=ε,

ρe=ρ{$l→label(e),$g→subgraph(Ga, e)}

V′

e ={w|(FrEpe w)∈V′},

E′

e ={(w1, l, w2)|(FrEpe w1, l,FrEp e w2)∈E′},

I′

e ={(&m, w)|(Hubpv&m, ε,FrEpe w)∈E′},

O′

e ={(w,&m)|(FrEpe w, ε,Hubpv&m)∈E′},

G′

e = (Ve′, Ee′, Ie′, O′e)

9 > > > > > > > > = > > > > > > > > ;

where (Va, Ea, Ia, Oa) =Ga

G =fwd eachedge(ρ, tb,(Va, Ea, Ia, Oa))

merge(ρ, ta,(Va, Ea, Ia, Oa),R) =taρ_G′

a

⊎ρU{ρ′e\ {$l→ } \ {$g→ } |(e, ρ′e)∈ R} where Eeps ={(u, ε, v)|(u, ε, v)∈Ea}

(V′′

e, E′′e) = (Ve′∪ {u}, Ee′∪ {(u, ρ′e($l),root(Ie′))}) for each e= (u, a, v)∈Eawitha=ε,

(e, ρ′

e)∈ R,(Ve′, Ee′, Ie′, O′e) =ρ′e($g)

G′

a = (SVe′′, Eeps∪SEe′′, Ia, Oa)

Figure 11: Core of the Backward Semantics ofrecat Code Position p

The composition by composerec produces two types of nodes. One is Vbody, which is the set of nodes v of of each graph fragments Ge to be combined. Note that, however, in order to keep the traceability, we augment the structured node ID ofv asFrEp e v recording the edge code IDp of the rec expression and the input edge e where the node is created. Of course, in the case of nested rec expressions, the ID v from the result of body expression themselves may be structured already. Similarly, the set of edges Ebody consists of edges fromGe with structured ID information added. The other type of nodes isVhub, which is used as connecting points ofGe’s. For instance, let e1 = (v, a, u) and e2 = (u, b, w) sharing the node u, and the recursion uses two markers &z1 and &z2. To connect Ge1 and Ge2 correctly,

nodes with output marker (x1,&z1) in Ge1 must be identified with nodes

with input marker (&z1, x2) in Ge2 (and similarly for &z2). To achieve this,

we prepare a node (Hubp v&z1) inVhub, and connect all the output marker nodes of Ge1 and the input marker nodes of Ge2 using ε-edges, which are

(28)

Next, our backward semantics is defined as follows:

rec(λ($l,$g). tb)(ta)ρ_G′ =

merge(ρ, ta, Ga,

bwd eachedge(ρ, tb, Ga,

decomprec(ρ, tb, G

′_{, G}

a, M))) where M =inMarker(tb)∪outMarker(tb)

Ga = [[ta]]ρ

Note the duality between the forward semantics. Backward semantics first decomposes bydecomprecthe modified result graphG

′ _{into pieces of graphs,}

which is intuitively an inverse operation ofcomprec. Each element of the de-composition is the (possibly modified) subpartG′_e ofG′, which corresponds to the forward computationGe. Then, inbwd eachedge, we carry out back-ward computation on each edge and compute the updated environmentρ′e. Finally, these environments aremerged into the updated environmentρ′ of the whole expression. The merge function does two works. First, by comb-ing the information ρ′e($l) and ρ′e($g) from the updated environments, it computes the modified argument graphG′_a4. Then we inductively carry out backward evaluation on the argument expression to obtain another updated environmentρ′

a. Thisρ′a and allρ′es are merged into ρ′.

Let us explain in more detail the definition of decomprec, which is the key point of the backward evaluation.

The function first extract from result graph G′ _nodes_V′

e and edges Ee′ that belonging to each edge e by matching structured ID FrEp e . Note that we require nodes that are newly inserted to the view also have this structure, so that these nodes are also passed to the backward evaluation of the recursion body. Input and output nodes with marker&m are recovered by selecting those pointed from/to hub nodes having structureHub &m. Top-level constructors of structured ID are erased so that we can inductively compute the backward image from the body expression.

Note that semantics uses node identities in computation, while graph data model assumes value equivalence based on bisimulation. Update de-tection or conflict resolution in ⊎ρ basically uses node identities, however, bisimulation equivalence may be used to resolve conflicts at the graph level instead of per node and edge basis, to cope with complex updating on the view.

4_{Merging operation in}S

(29)

Example 8 (Edge Renaming). The following transformation a2b replaces edge label awithband leaves other labels unchanged.

a2b =

rec(λ($l,$G).&z:=

if$l =athen {b: (&z)₁}2

else {$l : (&z)₃}4)($db)6

Above, expression t at code position p is written by tp. Let $db, which represents a source database, be the following graph.

s='&%$ !"#₁

c

(

a 6₆'&%$ !"#2

Then, the transformation result withε-edges and structured IDs is as follows.

G=

'& %$ !Hub6 1"#

,_,

'& %$

!FrE6 (InC 4) (1,a,2)"#

c

'& %$

!FrE6(InC 2) (1,c,2)"#

b

'& %$

!FrE6 (InC 3) (1,a,2)"#

'& %$

!FrE6(InC 1) (1,c,2)"#

r

'& %$ !Hub6 2"#

The graph is bisimilar to the following graph.

c

%

b 9₉

According to the backward semantics of the transformation a2b, if we modify graphGtoG′ by changing the edge labelctod, thena2b{_G$′db→s} returns binding{$db→s′} where

s′ ='&%$ !"#₁

d

(

a 6₆'&%$ !"#2.

The edge cin the source is changed to dsuccessfully. If the subgraph in the view is changed as

'& %$

!FrE6(InC 2) (1,c,2)"#

b

'& %$

!FrE6(InC 1) (1,c,2)"#

−→

'& %$

!FrE6 (InC4) (1,c,2)"#

e

'& %$

!FrE6 (InC3) (1,c,2)"#

then, according to the bidirectional semantics ofif, the source is changed to the following graph.

'&%$ !"#₁

c

(

(30)

Example 9 (Extracting Subgraph). The following transformation at ab

extracts all the subgraphs that can be reachable from the root by path a.b.

at ab =

rec(λ($l,$g). if$l =athen

rec(λ($l′,$g′).

if$l′ =bthen$g′ else{}1)($g)2 else{}3)($db)4

Then, we can reflect any changes on the extracted graphs to the correspond-ing source as long as the changed graph has appropriate structured IDs.

Let $db be graph /.-,()*+₁ a _/.-,_()*+_//₂ b/_/_G. _{Then the reachable part of the}

trans-formation results with structured ID becomes

'& %$

!Hub4 1&"# //'& !FrE4 (Hub2 2&) (1,a,2)"#%$ //G′

where subgraph G′ is obtained by replacing node v in G with

FrE4 (FrE2 v (2,b, r)) (1,a,2) where r is the root of G. Any modifi-cation on G′ can be reflected to Gin the source.

Example 10 (Customer2Order). The transformation mentioned in Sec-tion 2.3 is translated into UnCAL expression that has nesting ofrecsimilar to that in Example 9. Concrete UnCAL expression can be seen in Sec-tion C in the appendix. Since extracted subgraph can accept arbitrary updates, if we modify the label 16/10/2008 of the edge from node 7 to node 6 in graph gord in Figure 6, backward transformation systematically finds the corresponding edge and modify the edge from node 7 to node 6 in gcus in Figure 5, because the modified part is an extracted subgraph reachable from the root by pathcustomer.order.date. Subgraphs that are reached by the paths customer.order.no, customer.order.order of.name

and customer.order.order of.addcan be updated similarly.

4.5 Insertion Reflection based on ε-edge/ID Structure

So far the bidirectional semantics can only cope with in-place updating and deletion – modifications on edge labels or updating and deletion on extracted subpart of the original graph. However, except for the extracted subpart, it has the great limitation that no edge can be added to the original graph because it tightly uses the structures of ε-edges/IDs, which also enables us the backward semantics. In this section, we show how to extend it moderately to coped with insertion.

(31)

a inserted subpart. Matsuda et al. (2007) allow insertion only if the “origi-nal” data is uniquely determined by the inserted subpart. Liu et al. (2007) treat insertion only on the results of “map” but there is no guarantee of bidirectional properties in the framework.

Our insertion handling is inspired by those in Liu et al. (2007) and Matsuda et al. (2007). Like Liu et al. (2007), we treat insertion only on the result of rec. And, we follow Matsuda et al. (2007), we use ε-edge/ID structure as “complementary” information to uniquely create source data.

'& %$ !Hub6 1"#

'& %$

!FrE6 (InC 4) (1,a,2)"# d

'& %$

!FrE6 (InC 3) (1,a,2)"#

'& %$ !Hub6 2"#

Let us explain our basic idea with Example 8. If $db

is a graph/.-,()*+₁ d /_/_/.-,_()*+_{2 , then the transformation results in}

the graph on the right. In our insertion handling, we consider what happens if there exists an extra edge between two nodes, i.e., the transformation result of the following graph.

'&%$ !"#₁

c

(

??? 6₆'&%$ !"#2

Instead of inserting single edge, we consider the transformation result of the graph with inserted edge (1,???,2) that have undetermined label. Con-cretely, the input of the backward transformation with insertion handling is the following graph.

'& %$ !Hub6 1"#

,_,

'& %$

!FrE6(InC 4) (1,a,2)"#

c

'& %$

!FrE6 (InC2) (1,???,2)"#

b

'& %$

!FrE6(InC 3) (1,a,2)"#

'& %$

!FrE6 (InC1) (1,???,2)"#

r

'& %$ !Hub6 2"#

According to the semantics of rec, insertion of the FrE-subpart implies insertion of the edges. The structured ID in the inserted graph contains

???because at the time we do not know what is the edge that generates the inserted subgraph. Hence, by the backward transformation discussed here, we estimate a concrete label of the inserted edge. Because of the structured ID of the inserted subgraph, we can know that inserted subgraph comes from

true-branch of the following if-expressionif$l =athen{b:&1}2 else{$l :

&3}4 Since condition $l = amust be true to obtain the inserted subgraph, the backward transformation replaces the label ??? of $l by a. Hence, we obtain the following graph by the insertion.

'&%$ !"#₁

c

(

a 6₆'&%$ !"#2

(32)

new semantics ofif estimates a concrete label to replace???. After that, we discuss expressive power of our insertion reflection.

Note that ??? may cause the failure of the forward transformation; if it appears in the condition of if-expression, we cannot determine which branch to use. Recall that some backward semantics of an expression uses the forward semantics of its subexpression. Note that the failure of forward transformation do not mean the failure of the backward transformation immediately. When they are used for graph decomposition in a backward transformation, the backward transformation fails.

4.5.1 Backward Semantics of rec for Insertion Reflection

The modification on the recfor insertion reflection is very small. Since the IDs of an inserted subgraph must have the formFrE ( ,???, ), we can easily split the inserted subpart from a modified result ofrec. The definition ofdecomprec in Figure 11 changes a bit as follows.

decomprec(. . .(V

′_{, E}′_{, I}′_{, O}′₎_{. . .}_{) =}

. . .e∈E_a. . . ρ_e=ρ{. . .$g→subgraph(G_a, e)

where . . .

−→

decomprec(. . .(V

′_{, E}′_{, I}′_{, O}′₎_{. . .}_{) =}

. . . e∈E_a′ . . . ρ_e=ρ{. . .$g→ subgraph(G′_a, e)

where E_a′ =Ea

∪ {(u,???, v)|FrEp (u,???, v)∈E′}

G′

a = (Va, Ea′, Ia, Oa) . . .

The subgraph is performed on G′

a, which contains edge (u,???, v), instead ofGabecause to guarantee PutGetwe must ensure that the inserted edge does not affect other part of the output graph (V′, E′, I′, O′).

4.5.2 Estimating Label of Edge to be Inserted

There are only two places that backward transformation can replace ???

with a concrete value.

• Use of label variable $l caused by rec.

• Condition such as $l =ainif.

(33)

Here, we only write the formal definition of the backward semantics ofif where the evaluation result of condition is undetermined because of variable $l with ???in the expression.

ifbthent1 elset2ρ_G=

  

 

Fail if [[b]]ρ′′i ₌_???_for _i_{= 1}_,₂

ρ′′

1 if [[b]]ρ

′′ 1 =_true ρ′′₂ if [[b]]ρ′′2 ₌_false

where ρ′1 =t1ρ_G

ρ′₂ =t2ρ_G

ρ′′

1 =refine(ρ′1, b)

ρ′′2 =refine(ρ′2,notb)

Above, refine(ρ, b) is a function to be used to refine the environment ρ by replacing???with a concrete value so as to make the evaluation resultbto betrue.

There are many choices of refine(ρ, b). A strong approach would be to use some constraint solver. Instead, we take a more lightweight choice that cares a condition of the form $l = v or $l = v and fixes the value of $l even though there are multiple possibilities. For example, refine({$l →

???},$l = a) returns {$l = a} while refine({$l → ???},$l = a) returns

{$l →bogus} wherebogusis magically chosen value to suffice $l =a. Note that refine({$l → ???,$r = a},$l = $r) returns {$l → a,$r = a} because we can know the concrete value of $r. To guarantee PutGet,refine fails if it encounters expression such as $l = $r where both $l and $r are bound to

???.

4.5.3 Discussion of Insertion Reflection

As we have shown with Example 8, the insertion reflection method here work well for single rec. However, the insertion reflection method does not work for the nestedrecsuch as that in Example 9.

The main reason of the limitation is that the one graph can be traversed more than once by nestedrec; since the backward transformation reflect the concrete graph to the source by each traverse, the reflection would conflict for each traverse.

Let us consider the transformation of Example 9. Let $db be a graph with isolated three nodes /.-,()*+_{1 ,} /.-,()*+_{2 and} /.-,()*+_{3 . Then, the transformation result}

consists of three hub nodes of the form Hub4 & but does not have any edge. If we want to insert something to the view, we must inserta.bpath to the source.

To inserta.b, we consider the transformation result of $db='&%$ !"#₁???_!"#/_'&%$_/₂???/_/_'&%$_!"#₃

(34)

following graph.

'& %$

!&:Hub4 1&"#

'& %$

!FrE4(Hub2 2&) (1,???,2)"#

'& %$

!FrE4 (FrE2 3 (2,???′,3)) (1,???,2)"#

'& %$

!FrE4(Hub2 3&) (1,???,2)"#

'& %$ !Hub4 2&"#

'& %$

!FrE4(InC 3) (2,???′,3)"#

'& %$ !Hub4 3&"#

Here, for presentation, we denote???to be replaced byaand bby ???and

???′ respectively.

However, the insertion reflection fails due to the following two problems.

• The backward transformation of the outer rec calls the backward transformation ofrec-body forFrE4 (1,???,2) nodes. However, the backward transformation ofrec-body fails because of mismatch of hub nodes of the innerrec. The Vhub of the inner recin the original data is Hub2 2 & (hub nodes of reachable part of node 2), while the hub nodes of the output graph areHub2 2 &and Hub2 3 &.

• The backward transformation of the outer rec tries to estimate the label of inserted edge. If the first problem is solved well, we can estimate the label of the edge (1,???,2) from the condition $l = a. However, the estimation of the second edge (2,???,3) by the outerrec can replace ??? with label different from b, which causes the failure by inconsistent environement in the later step.

There would be many approaches towards the problems. Preprocessing UnCAL expression using fusion method to flatten nestedrecto one flatrec avoids the problems. However, if the transformation has “join”, in which innerrecuses bound variable of outerrec, it is hard to flatten nestedrecs. Even when the fusion method is not applicable, the first problem would be relatively easy to solve. If the view contains an extra hub nodeHubp v , we can safely assume that the node comes from the new or existing node v. Since the original graph is available in backward transformation, we can know whether a node is new or existing. On the contrary, more discussion is needed to solve the second problem. The second problem seems to be solved by changing backward semantics to calculate bindings that maps variable to a set of values and with constraints on the variables. However, in addition to the problem of choosing appropriate description of suchsets, it makes the guarantee of PutGetproperty much and much harder.

(35)

• Environment merging operator ⊎must consider the “sets” and “con-straints”.

Note that approximation by wider (less constrained) set may not work. Since the sets and the constraints represents the set of sources of an output graph, the approximated set might be too wide to guaranteePutGet.

One may think that it is also a problem that the insertion here does not consider insertion of nodes. However, in contrast to the above two problems, the insertion of node is easily solved because an isolated node in the source is always transformed to an isolated hub node in the view by rec. However, because the first problem above, we can use the inserted node only in outermostrec.

Since the essence of difficulties of insertion reflection in Customer2Order that appeared in Section 2.3 is contained in insertion reflection in Example 9, we hardly insert subgraphs in Customer2Order because of the same reasons. To give an appropriate and reasonable solution to the problem is one of our future work.

4.6 Main Theorem

Now that semantics of all UnCAL expressions are defined, we summarize our main result as the following theorem. See section A in the appendix for the proof of the theorem.

Theorem 2. Every bidirectionalized UnCAL expression satisfies(PutGet)

and(GetPut)properties, provided that its forward and backward evaluation succeeds.

5 Implementation and Experiments

The prototype system has been implemented in OCaml and is available online5. One can run UnQL queries both forwardly and backwardly. In addition to the examples in Buneman et al. (2000), we have tested the following three non-trivial examples, showing its usefulness in software engineering and database management.

• Customer2Order: a case study in the textbook on model-driven soft-ware development (Pastor and Molina 2007).

• PIM2PSM: a typical example of transforming a platform independent object model to platform specific object model.

• Class2RDB; a non-trivial benchmark application for testing power of model transformation languages (Bezivin et al. 2005).

5

(36)

All of them demonstrate the effectiveness of our approach in practical applications.

In our implementation, we carefully treat ε-edges, unreachable parts introduced during operations related to markers, and retrieval of edges or nodes of interest, which greatly affect the performance. Bad treatment would hinder large scale UnQL queries to evaluate in bidirectional mode6 _{in a} reasonable amount of time. Several orders of magnitude speed-up has been achieved since our initial implementation by the following optimizations.

Reduction of the number of ε-edges As mentioned in the UnQL

paper (Buneman et al. 2000), ε-edges are generously generated during evaluation, especially inrec. It makes evaluation process slow due to growth of input size. Removingε-edges during evaluation does no harm on forward semantics because of bisimulation equivalence. However, since ε-edges play an important role in backward evaluation, they are not freely omitted in our bidirectional settings. Moreover, a straightforward implementation of the removal algorithm (Buneman et al. 2000) may introduce additional edges, which may harm backward evaluation. Towards prudent removal ofε-edges suitable for backward evaluation, our ε-removal algorithm glue source and destination nodes ofεas long as bisimulation equivalence is not violated.

Pruning of unreachable nodes @andrecmay leave unreachable nodes

if some input and output nodes are left unconnected due to mismatch of markers. This mismatch happens typically in the case of projection of graph components &zi by idiom &zi@g and in the case of @ in the definition of rec in rec(λ($l,$g).tb)(l:g) = tb(l, g) @rec(tb)(g). Typical usage of rec includes tb with no output marker, where actual recursion below the first level from the input nodes — subgraphs that is not reachable by traversing only one edge — is not necessary because the @does not connect the first and second arguments due to the lack of matching markers. This opens optimization opportunities; not to evaluate tb for the input graph that is lower than the first level. Performance improvement was significant for the forward evaluation.

Indexing As seen in the formal notion of bulk semantics composerec and

decomprec, pattern matching on edges for the entire graph is often needed. Constructing maps from node identifiers to the sets of associated incoming and outgoing edges has been effective to improve performance on both forward and backward directions.

6 Related Work

In addition to the related work in the introduction, we discuss other most related work.

(37)

Bidirectional transformation has been discussed as view updating prob-lem in the database community. Bancilhon and Spyratos (1981) proposed a general approach to the problem. They introduced an elegant solution based on the concept of constant complement view which enables recovery of lost information of the sources. It has been applied to bidirectional transforma-tion on relatransforma-tional database (Cosmadakis and Papadimitriou 1984; Laurent et al. 2001; Lechtenb¨orger and Vossen 2003) and tree structures (Matsuda et al. 2007). However, this approach assumes the presence of automatic in-verse transformation, which makes it difficult to use.

Foster et al. (2005) and Bohannon et al. (2008) proposed the first linguis-tic approach to bidirectional transformation. They developed some domain specific languages for supporting development of bidirectional transforma-tion on strings and trees. However, their approach is limited to strings and trees, and difficult to be applied to graph transformation due to graph-specific features such as circularity and sharing.

In the context of software engineering, there are several work on bidirec-tional model (graph) transformation (Ehrig et al. 2005; OMG 2005; Jouault and Kurtev 2006), which can deal with a kind of graph structures. As pointed by Stevens (2007), there lacks clear formal semantics of bidirectional model transformation and there is no powerful bidirectionalization method yet that can be used to automatically derive backward model transformations from forward model transformations so that both transformations form a consis-tent bidirectional model transformation. This calls for a more serious study on bidirectional graph transformation, which actually inspired the work in this paper.

The concept of structural recursion is not new and has been studied in both the database community (Breazu-Tannen et al. 1991) and the functional programming community (Sheard and Fegaras 1993). However, most of them focused on structural recursion over lists or trees instead of graphs. Examples include the higher order function fold (Sheard and Fegaras 1993) in ML and Haskell, and the generic computation pattern called catamorphism in programming algebras (Bird and de Moor 1996). It is UnCAL (Buneman et al. 2000) that shows the idea of structural recursion can be extended to graph, but the focus there was on query fusion optimization rather than bidirectionalization.

テクニカルレポート | GRACEセンター

GRACE TECHNICAL REPORTS

Bidirectionalizing Structural Recursion on

Graphs

Soichiro Hidaka Zhenjiang Hu Kazuhiro Inaba

Hiroyuki Kato Kazutaka Matsuda Keisuke Nakano

Bidirectionalizing Structural Recursion on Graphs

put

modify

t’

get

t

s’

s

1

Introduction

2

Graph Transformation in UnQL

3

UnCAL: An Internal Graph Algebra

4

Bidirectionalization of UnCAL

5

Implementation and Experiments

6

Related Work

7

Conclusion